Motivation

We wanted to explore for possible links between civilian ownership of firearms and total homicides - not just firearm-related homicides.

This has been looked at before, but surprisingly most analysis does not control for income, which is a major confounding factor at the country level.

TODO review of blogs and literature that have looked at the same data.

Data

The indicators2005 data was developed for this purpose. Here’s the first five rows of that dataset:

region Pop2005 country worldbankincomegroup Alcohol Alpha_2 Alpha_3 FemaleSuicide MaleSuicide Suicide giniyear gini gnpyear GNPPerCapitaPPP homicideyear Homicide Average total all civilian firearms FirearmsPer100People2005 HDI rich
Europe 3833400 Bosnia and Herzegovina Upper-middle-income 9.6 BA BIH NA NA NA 2004 34.04 2005 6680 2009 1.9 675000 17.6083894 0.697 FALSE
Africa 4055600 Central African Republic Low-income 1.6 CF CAF NA NA NA 2003 43.61 2005 730 2012 13.2 40000 0.9862906 0.325 FALSE
Europe 1032600 Cyprus High-income 9.3 CY CYP NA NA NA 2005 30.26 2005 27060 2005 1.9 275000 26.6318032 0.830 FALSE
Americas 9237600 Dominican Republic Upper-middle-income 5.8 DO DOM NA NA NA 2005 49.96 2005 7380 2005 25.9 450000 4.8713952 0.676 FALSE
Africa 1377800 Gabon Upper-middle-income 7.9 GA GAB NA NA NA 2005 42.18 2005 13860 2012 9.4 190000 13.7901002 0.644 FALSE

We’re interested in the Homicide column, which was sourced from the World Development Indicators. This variable has a skewed distribution:

ggplot(indicators2005, aes(x = Homicide, fill = region, colour = region)) + 
  geom_density(alpha = 0.5) + 
  geom_rug() +
  ggtitle("Homicides per 100,000 population") +
  labs(caption = "Source: WDI")

Explain the firearms data.

Explain the HDI.

Pairwise relationships

There’s a very faint negative relationship between our main explanatory variable of interest, FirearmsPer100People2005, and homicide rates. This is clearer on the logarithmic scale. The USA is such an outlier for firearms ownership on the raw scale. The (TODO: which right wing outfit?) used this to claim that gun ownership makes a country safer.

p1 <- ggplot(indicators2005, aes(x = FirearmsPer100People2005, y = Homicide, label = Alpha_3)) +
  geom_smooth(method = "lm") +
  geom_text_repel(colour = "white") +
  geom_point() +
  labs(x = "Civilian firearms per 100 people", 
       y = "Homicide per 100,000")
p1

p1 + scale_x_log10() + scale_y_log10()

There’s a much stronger negative relationship between national average income and homicides. Richer countries have lower homicide rates:

ggplot(indicators2005, aes(x = GNPPerCapitaPPP, y = Homicide, label = Alpha_3)) +
  geom_smooth(method = "lm") +
  geom_text_repel(colour = "white") +
  geom_point() +
  labs(x = "Gross National Income per person, purchasing power parity measure", 
       y = "Homicide per 100,000") +
  scale_x_log10() +
  scale_y_log10()

And there’s a strongly positive relationship between inequality and homicide rates:

ggplot(indicators2005, aes(x = gini, y = Homicide, label = Alpha_3)) +
  geom_smooth(method = "lm") +
  geom_text_repel(colour = "white") +
  geom_point() +
  labs(x = "Economic inequality, Gini coefficient (higher number is more unequal)", 
       y = "Homicide per 100,000") +
  scale_y_log10()

Here’s a summary of the overall pair-wise relationships

indicators2005 %>%
  mutate(LogGNI = log(GNPPerCapitaPPP),
         LogFirearms = log(FirearmsPer100People2005),
         LogHomicide = log(Homicide)) %>%
  select(LogGNI, HDI, gini, LogFirearms, LogHomicide, Alcohol, rich) %>%
  mutate(rich = ifelse(rich, "Most developed", "Less developed")) %>%
  filter(!is.na(rich)) %>%
  ggpairs(mapping = ggplot2::aes(color = rich))

Modelling

I’m first going to fit a model with ordinary least squares because it has a high degree of familiarity and may give people confidence in what is to come. Then I’m going to use elastic net regularisation.

First I’m going to restrict mysefl to complete cases.

TODO - instead of doing this, do imputation, and integrate the imputation into the validation and confidence interval bootstraps.

HomicideData <- indicators2005 %>%
  dplyr::select(Homicide, gini, FirearmsPer100People2005, HDI, country, Alcohol)
HomicideData <- HomicideData[complete.cases(HomicideData), ]

Ordinary least squares

First the full model fit with OLS. I try two versions - one with an interaction between inequality and the HDI, and one without, and start with a global test. There’s evidence of an interaction:

model_full <- lm(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) +  Alcohol, data = HomicideData)

model_no_interactions <- lm(log(Homicide) ~ gini + HDI + log(FirearmsPer100People2005) +  Alcohol, data = HomicideData)
anova(model_no_interactions, model_full)
## Analysis of Variance Table
## 
## Model 1: log(Homicide) ~ gini + HDI + log(FirearmsPer100People2005) + 
##     Alcohol
## Model 2: log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) + 
##     Alcohol
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)   
## 1    131 84.411                               
## 2    130 77.651  1    6.7591 11.316 0.00101 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference

Here’s the estimates of the various coefficients and their standard errors:

Dependent variable:
log(Homicide)
gini -0.071*
(0.041)
HDI -10.111***
(2.512)
log(FirearmsPer100People2005) 0.153**
(0.069)
Alcohol 0.011
(0.023)
gini:HDI 0.208***
(0.062)
Constant 5.325***
(1.657)
Observations 136
R2 0.498
Adjusted R2 0.478
Residual Std. Error 0.773 (df = 130)
F Statistic 25.755*** (df = 5; 130)
Note: p<0.1; p<0.05; p<0.01

The p-value for the positive firearms effect on homicides is 0.02 or 0.03 whether the interaction between inequality and income is in or out.

Best way to get a confidence interval for the size of the firearms effect is with a bootstrap.

myfunction <- function(x, i){
  the_data <- x[i, ]
  bm <- lm(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) +  Alcohol, data = the_data)
  return(coef(bm)["log(FirearmsPer100People2005)"])
}

full_model_boot <- boot(HomicideData, myfunction, R = 5000)
bc <- boot.ci(full_model_boot, type = "bca")
bc
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = full_model_boot, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.0300,  0.2771 )  
## Calculations and Intervals on Original Scale

With log transformation of both the response and explanatory variables, the coefficients are the % change in y for a 1% change in x. So we interpret this confidence interval a 1% increase in firearms held leads to between a 0.03% and 0.27% increase in homicides.

Diagnostics

All ok:

par(mfrow = c(2, 2))
plot(model_full)

Elastic net regularization

First we need to decide between the lasso (alpha = 1), ridge regression (alpha = 0), or between

library(glmnet)
X <- model.matrix(log(Homicide) ~ gini * HDI + log(FirearmsPer100People2005) +  Alcohol - 1, data = HomicideData)
Y <- log(HomicideData$Homicide)

alphas <- 0:50 / 50
res <- rep(0, length(alphas)) 

for(i in 1:length(alphas)){
    cvmod <- cv.glmnet(x = X, y = Y, alpha = alphas[i])  
    res[i] <- sqrt(min(cvmod$cvm))
}

plot(alphas, res) # so it doesn't matter much, but perhaps better towards 1 than zero?

Now we need a value for lambda which controls how much shrinkage done of the coefficients. Cross-validation still the best way to do this. We can get the best value for prediction, or the most aggressively shrinking value that gets prediction error within one standard error of the best.

cvmod <- cv.glmnet(x = X, y = Y, alpha = 0.9)
model_enr <- glmnet(x = X, y = Y, alpha = 0.9)

coefs <- round(cbind(
  original = coef(model_full),
  shrunk = coef(model_enr, s = cvmod$lambda.min),
  veryshrunk = coef(model_enr, s = cvmod$lambda.1se)
), 3)

colnames(coefs) <- c("original", "shrunk", "veryshrunk")
coefs
## 6 x 3 sparse Matrix of class "dgCMatrix"
##                               original shrunk veryshrunk
## (Intercept)                      5.325  4.497     -0.199
## gini                            -0.071 -0.051      0.052
## HDI                            -10.111 -8.833     -0.580
## log(FirearmsPer100People2005)    0.153  0.150      .    
## Alcohol                          0.011  0.007      .    
## gini:HDI                         0.208  0.176      .

So, the best performing model has Firearms as an explanatory variable, and is basically untouched by the elastic net regularization. But a more ruthless model, happy to trade off a bit of predictive power for simplicity, would only include inequality and level of development in predicting homicide levels.

Conclusion

There is evidence that higher civilian ownership of firearms in 2007 is related to higher overall homicide rates. However, the effect is not very large (about 0.16% increase in homicide for 1% increase in firearms owned), and is less important than either income/development (higher level of development leads to less homicides) or inequality (higher inequality leads to more homicides)